Abstract

An approximate sparse recovery system in $\ell_1$ norm makes a small number of measurements of a
noisy vector with at most $k$ large entries and recovers those heavy hitters approximately. Formally, it
consists of parameters $N, k, \epsilon$, an $m$-by-$N$ measurement matrix, $\Phi$, and a decoding algorithm, $\mathcal{D}$. Given
a vector, $x$, where $x_k$ denotes the optimal $k$-term approximation to $x$, the system approximates $x$ by
$\hat{x} = \mathcal{D}(\Phi x)$, which must satisfy

$$\|\hat{x} - x\|_1 \le (1+\epsilon)\|x - x_k\|_1.$$

Among the goals in designing such systems are minimizing the number $m$ of measurements and the
runtime of the decoding algorithm, $\mathcal{D}$. We consider the "forall" model, in which a single matrix $\Phi$,
possibly "constructed" non-explicitly using the probabilistic method, is used for all signals $x$.

Many previous papers have provided algorithms for this problem. But all such algorithms that use the
optimal number $m = O(k\log(N/k))$ of measurements require superlinear time $\Omega(N\log(N/k))$. In this
paper, we give the first algorithm for this problem that uses the optimum number of measurements (up
to constant factors) and runs in sublinear time $o(N)$ when $k$ is sufficiently less than $N$. Specifically, for
any positive integer $\ell$, our approach uses time $O(\ell^5\epsilon^{-3}k(N/k)^{1/\ell})$ and uses $m = O(\ell^8\epsilon^{-3}k\log(N/k))$
measurements, with access to a data structure requiring space and preprocessing time $O(\ell Nk^{0.2}/\epsilon)$.

1 Introduction

1.1 Description of Problem

Variations of the Sparse Recovery problem are well-studied in recent literature. A vector (or signal) $x$ is
first measured, by the matrix-vector product $\mu = \Phi x$, then, at a later time, a decoding algorithm $\mathcal{D}$
approximates $x$ from $\mu$. The approximation is non-vacuously useful if $x$ is dominated by a small number
of large magnitude entries, called "heavy hitters." Applications arise in signal and image processing and
databases, with further application to telecommunications and medicine [DDT+08, LDP07]. Several
workshops [CA09, SPA09] have been devoted to this topic. See more at [Ric06].

In this paper, we focus on the following variation. If $N$ is the length of the signal, $k$ is a sparsity
parameter, and $\epsilon$ is a fidelity parameter, we want $\|\hat{x} - x\|_1 \le (1+\epsilon)\|x_k - x\|_1$, where $x_k$ is the best
possible $k$-term representation for $x$. Among the goals in designing such systems are minimizing the number
$m$ of measurements and the runtime of the decoding algorithm, $\mathcal{D}$. We consider the "forall" model, in which
a single matrix $\Phi$, possibly "constructed" non-explicitly using the probabilistic method in polynomial time
or explicitly in exponential time, is used for all signals $x$.

1.2 Advantages over Previous Work 

Previous measurement-optimal algorithms are slow. Many previous papers have provided algorithms
for this problem. But all such algorithms that use the constant-factor-optimal number $O(k\log(N/k))$ of
measurements require superlinear time $\Omega(N\log(N/k))$. In this paper, we give the first algorithm that,
for any positive integer $\ell$, uses $\ell^{O(1)}\epsilon^{-3}k\log(N/k)$ measurements (i.e., $O(k\log(N/k))$ measurements for
constant $\ell$ and $\epsilon$) and runs in time $\ell^{O(1)}\epsilon^{-3}k(N/k)^{1/\ell}$. For example, with $\ell = 2$ and $\epsilon = \Omega(1)$, the
runtime improves from $N\log(N/k)$ to $\sqrt{kN}$. In some applications, sparse recovery is the runtime bottleneck and our
contribution can make some other $\Omega(N)$ computation become the new bottleneck.

The sublinear runtime of our algorithm is important not because traditional algorithms are too slow, but
because the measurement-optimal algorithms that replaced them are too slow. Consider an application in
which $k \ll N$. A traditional approach makes exactly $N$ direct measurements or (in some cases) requires
little more than taking a single Fast Fourier Transform of length $N$. Optimized code for FFTs is so fast that
one cannot plausibly claim to lower the runtime, say from $N\log N$ to $\sqrt{N}$, by a complicated algorithm with
heavy overhead. But, in some cases (see below), taking more measurements than necessary is a significant
liability. Several papers in the literature (see Table 1) improve the number of measurements from $N$ to
$k\log(N/k)$, but only with significant increase in the runtime, say, from computing a Fourier Transform to
solving a linear program or, more recently, performing a combinatorial algorithm on expander graphs, of
a flavor similar to our approach below. When $k$ is small compared with $N$, we hope that (i) the number
$O(k\log(N/k))$ of measurements made by our algorithm is significantly less than $N$ in practice, and (ii) the
sublinear runtime of our algorithm is significantly faster than that of other measurement-optimal algorithms,
all of which use time $\Omega(N\log(N/k))$, and many of which have significant overhead. We do not expect that
our "sublinear time" algorithm will compete on time with naive time $O(N)$ algorithms or with a single FFT,
except in unusual circumstances and/or values of $k$ and $N$.

These questions have been actively studied by several communities. See Table 1, which is based in part
on a table in [IR08].

Trading runtime for fewer measurements in sublinear-time algorithms. Previous sublinear-time
algorithms for this problem have used too many measurements by logarithmic factors, which we now argue is
inappropriate in certain situations. In a traditional approximation algorithm, there is an objective function
to be minimized, and relatively small improvements in the approximation ratio for the objective function
(from $O(\log n)$ to constant-factor to $(1 + o(1))$) are considered well worth a polynomial blowup in
computation time. In the sparse recovery problem, there are two objective functions: the approximation ratio by
which the error $\|\hat{x} - x\|_1$ exceeds the optimal, and the number of measurements. In case of medical
imaging, the number of measurements is proportional to the duration during which a patient must lie motionless;
a measurement blowup factor of "1000 times log of something" is unacceptable. In this paper, we reduce
the blowup in number of measurements to a small integer constant factor.

Previous sublinear time algorithms had runtime polynomial in $k\log(N)$, often linear in $k$. By
contrast, our algorithm gives runtime $\epsilon^{-3}\sqrt{kN}$ or, more generally, gives runtime $\ell^{O(1)}\epsilon^{-3}k(N/k)^{1/\ell}$ using
$\ell^{O(1)}\epsilon^{-3}k\log(N/k)$ measurements, which remains slightly suboptimal. But, first, a blowup in runtime
from, say, $k\log^{O(1)} N$ to $k(N/k)^{1/\ell}$ is appropriate to reduce the approximation ratio from logarithmic to small
constant in a critical objective like number of measurements in certain applications. Second, the blowup
is not that big in other applications, where, say, $(N/k)^{1/\ell}$ is not much bigger than $\log^{O(1)}(N)$.
Alternatively, a parametric sweet spot for our algorithm occurs around $k = N^{1/4}$. Putting $\ell = 3$, we get runtime
$k(N/k)^{1/3} = \sqrt{N} = k^2$. This is about the time to multiply a vector by a dense matrix of smallest useful
size, which is a tiny component in some (early) algorithms in the literature, superlinear or sublinear.
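
The sweet-spot arithmetic can be checked directly. The following is only a worked restatement of the claim above, substituting $k = N^{1/4}$ and $\ell = 3$ and ignoring the factors of $\ell$ and $\epsilon$:

```latex
\[
k\left(\frac{N}{k}\right)^{1/3}
  = N^{1/4}\left(N^{3/4}\right)^{1/3}
  = N^{1/4}\cdot N^{1/4}
  = N^{1/2}
  = k^{2}.
\]
```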

Constant factor gap in number of measurements. The best previous superlinear algorithms [RV06] use
a number of measurements that is suboptimal by a small constant factor versus the best known lower bounds.
Thus, for sublinear-time algorithms, a small constant-factor gap, rather than an approximation scheme, is
currently an appropriate goal.

All signals or Each signal? The results of this paper are in the "forall" model. Recently, a
sublinear-time, constant-factor-optimal measurement algorithm was given [GPLS10] in an incomparable setup. In
particular, its guarantees were for the weaker "foreach" model, in which a random measurement matrix
works with each signal, but no single matrix works simultaneously on all signals.¹ The stronger forall model
is more appropriate in certain applications, where, for example, there is a sequence $x^{(1)}, x^{(2)}, \ldots$ of signals
to be measured by the same measurement matrix, and $x^{(2)}$ depends, in some subtle way, on the result of
recovering $x^{(1)}$. (For example, an adversary may construct $x^{(2)}$ after observing an action we take in response
to recovering $x^{(1)}$.) In the forall model, there is no issue. In the foreach model, however, it is important that
an adversary pick the signal without knowing the outcome $\Phi$. If the adversary knows something about the
outcome $\Phi$ (such as observing our reaction to recovering $x^{(1)}$ from $\Phi x^{(1)}$), the adversary may be able to
construct an $x^{(2)}$ in the null space of $\Phi$, which would break an algorithm in the weaker foreach model.
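
The distinction between the two models comes down to where the quantifier over signals sits relative to the randomness of the matrix. The display below is a schematic restatement only; the failure probability $\delta$ and the signal classes $C \subseteq C'$ are placeholder symbols assumed here for illustration, and "succeeds" stands for the $\ell_1$ guarantee stated in Section 1.1.

```latex
\begin{align*}
\text{forall:}\quad  & \Pr_{\Phi}\bigl[\,\forall x \in C:\ \mathcal{D}(\Phi x)\ \text{succeeds on } x\,\bigr] \ge 1-\delta,\\
\text{foreach:}\quad & \forall x \in C':\ \Pr_{\Phi}\bigl[\,\mathcal{D}(\Phi x)\ \text{succeeds on } x\,\bigr] \ge 1-\delta.
\end{align*}
```

In the first line a single draw of $\Phi$ must work for every signal at once, which is why a signal chosen after seeing the behavior of $\Phi$ causes no difficulty there.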

1.3 Overview of Results and Techniques 

First, following previous work [GPLS10], we show that it suffices to recover all but approximately $k/2$ of
$k$ heavy hitters at a time. The cost for this in measurements is $ck\log(N/k)$, for some constant $c$. We then
repeat on the remaining $k' = k/2$ heavy hitters, with cost $ck'\log(N/k') \approx \frac{1}{2}ck\log(N/k)$, and leaving $k/4$
heavy hitters. Continuing this way, the total cost is a geometric progression with sum $O(k\log(N/k))$. In
fact, we will use somewhat more than $\frac{1}{2}ck'\log(N/k')$ measurements, e.g., $\frac{3}{4}ck'\log(N/k')$ measurements, to
enforce other requirements while still keeping the number of measurements bounded by a geometric series
that converges to $O(k\log(N/k))$. As in [GPLS10], we present a compound loop invariant satisfied as the
number of heavy hitters drops from $k$ to $k/2$ to $k/4$, etc.
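
The geometric-progression claim can be illustrated numerically. The sketch below is not the paper's procedure; the function name `round_budgets`, the constant `c = 1`, and the halving ratio are assumptions chosen only to show that the per-round budgets $c\,k_j\log(N/k_j)$ with $k_j = k/2^j$ sum to a small constant multiple of the first round's budget.

```python
import math

def round_budgets(N, k, c=1.0, ratio=0.5):
    """Hypothetical per-round budgets c * k_j * log(N / k_j), with k_j = k * ratio**j."""
    budgets = []
    k_j = float(k)
    while k_j >= 1.0:
        budgets.append(c * k_j * math.log(N / k_j))
        k_j *= ratio  # the number of unrecovered heavy hitters halves each round
    return budgets

N, k = 2**20, 2**10
budgets = round_budgets(N, k)
first = budgets[0]   # cost of the first round, c * k * log(N/k)
total = sum(budgets) # cost of all rounds together
print(total / first)
```

Each round costs slightly more than half of the previous one, because $\log(N/k_j)$ grows as $k_j$ shrinks, yet the total stays within a constant factor of the first round.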

¹ In the forall model, the guarantee is that a matrix $\Phi$ generated according to a specified distribution succeeds on all signals in a
class $C$. In the foreach model, there is a distribution on matrices, such that for each signal $x$ in a class $C'$ bigger than $C$, a matrix
$\Phi$ chosen according to the prescribed distribution succeeds on $x$. The difference in models is captured in the order of quantifiers,
which can be anthropomorphized into the powers of a challenger and adversary.

Table 1: Summary of the best previous results and the result obtained in this paper. Some constant factors are omitted for clarity.
"LP" denotes (at least) the time to do a linear program of size at least $N$. The column "A/E" indicates whether the algorithm works
in the forall (A) model or the foreach (E) model. The column "noise" indicates whether the algorithm tolerates noisy measurements.
Measurement and decode time dependence on $\epsilon$, where applicable, is polynomial.

Next, we show how to solve the transformed problem, i.e., how to reduce the number of unrecovered
heavy hitters from $k$ to $k/2$, while not increasing the noise by much. As in previous results, we estimate
all $N$ coefficients of $x$ by hashing the positions into $O(k)$ buckets, hoping that each heavy hitter ends
up dominating its bucket, so that the bucket aggregate is a good estimate of the heavy hitter. We repeat
$O(\log(N/k))$ times, and take a median of estimates. Finally, we replace by zero all but the largest $O(k)$
estimates. If all estimates were independent, then this would give the result we need, by the Chernoff bound;
below we handle the minor dependence issues. We get a simple and natural system making $O(k\log(N/k))$
measurements but with runtime somewhat larger than $N$.

Finally, to get a sublinear time algorithm, we replace the above exhaustive search over a space of size $N$
with constantly-many searches over spaces of size approximately $\sqrt{kN} = k(N/k)^{1/2}$. Still more generally,
replace with $\ell^{O(1)}$ searches over spaces of size $\ell^{O(1)}k(N/k)^{1/\ell}$, for any positive integer value of the
user-parameter $\ell$. As a tradeoff, this requires the factor $\ell^{O(1)}$ times more measurements. In the case $\ell = 2$, we
first hash the original signal's indices into $\sqrt{kN}$ buckets, forming a new signal $x'$, indexed by buckets. As
we show, heavy hitters in $x$ are likely to dominate their buckets, which become heavy hitters of $x'$. We then
find approximately $k$ heavy buckets exhaustively, searching a space of size $\sqrt{kN}$. Each bucket corresponds
to approximately $N/\sqrt{kN} = \sqrt{N/k}$ indices in the original signal, for a total of $k\sqrt{N/k} = \sqrt{kN}$ indices,
which are now searched. This naturally leads to runtime $\sqrt{kN}\log(N/k)$, or $k(N/k)^{1/\ell}\log(N/k)$ for $\ell > 2$.
By absorbing $\log(N/k)$ into $\ell^{O(1)}(N/k)^{1/\ell}$, we get, for general $\ell$ and $\epsilon$:

Theorem 1 For any positive integer $\ell$, there is a solution to the $\ell_1$ forall sparse recovery problem running
in time $O(\ell^5\epsilon^{-3}k(N/k)^{1/\ell})$ and using $O(\ell^8\epsilon^{-3}k\log(N/k))$ measurements, where $N$ is the length, $k$ is the
sparsity, and $\epsilon$ is the approximation parameter. The algorithm uses a data structure that requires space and
preprocessing time $O(\ell Nk^{0.2}/\epsilon)$.
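
To make the $\ell = 2$ search-space arithmetic from the overview concrete, here is a minimal sketch that only computes the sizes involved. It does not implement the recovery algorithm; the function name `two_stage_sizes` and the example values of $N$ and $k$ are assumptions for illustration.

```python
import math

def two_stage_sizes(N, k):
    """Sizes for the l = 2 scheme: (buckets, indices per bucket, candidate indices)."""
    buckets = math.isqrt(k * N)          # about sqrt(kN) buckets index the hashed signal x'
    per_bucket = math.ceil(N / buckets)  # about sqrt(N/k) original indices land in each bucket
    candidates = k * per_bucket          # indices searched after finding about k heavy buckets
    return buckets, per_bucket, candidates

N, k = 2**20, 2**10
buckets, per_bucket, candidates = two_stage_sizes(N, k)
print(buckets, per_bucket, candidates)
```

For these values both searches range over $2^{15}$ items, far fewer than the $N = 2^{20}$ positions examined by the exhaustive approach.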

Note: For thoroughness, we count the factors of $\ell$ and $1/\epsilon$. The reader is warned, however, that we are
aware of possible improvements, so the reader may want to abstract $\ell^5/\epsilon^3$ and $\ell^8/\epsilon^3$ to the simpler expression
$(\ell/\epsilon)^{O(1)}$. Similarly, the power 0.2 of $k$ in the preprocessing costs can be improved but with
constant-factor increases to runtime or number of measurements, and we suspect that expensive preprocessing can be
eliminated altogether. (The focus of this paper is just sublinear runtime and constant-factor-optimal number
of measurements, while other aspects of the algorithm are reasonable but not optimal.)

1.4 Organization of this paper

This paper is organized as follows. Note that we are specifying a measurement matrix and a decoding
algorithm; we refer to the combination as a system. In Section 2 we present notation and definitions. In
Section 3 we present our main result, in three subsections. In Section 3.1 we show how to get a Weak
system, that recovers all but $k/2$ of $k$ heavy hitters, while not increasing the noise by much. This is a
relatively slow Weak system that illustrates several concepts on which we build. In Section 3.2 we build
on Section 3.1 to give a sublinear time version of the Weak system. In Section 3.3 we show how to get a
solution to the main problem. In Section 4 we give several open problems in connection with optimizing
and generalizing our results.

2 Preliminaries and Definitions 

In this section, we present notation and definitions. 

Systems. We will usually present systems for measurement and decoding as algorithmic units without all 
the details of the measurement matrix and decoding algorithm. We will then usually argue correctness at the 
system level, then argue that the system can be implemented by a matrix with the claimed number of rows 
and a decoding algorithm with the claimed runtime. 

Notation. For any vector $x$, we write $x_k$ for the best $k$-term approximation to $x$ or the $k$'th element of $x$;
it will be clear from context. For any vector $x$, we write $\mathrm{supp}(x)$ for the support of $x$, i.e., $\{i : x_i \ne 0\}$.

Normalization. Our overall goal is to approximate $x_k$, the best $k$-term approximation to $x$. For the
analysis in this paper, it will be convenient to normalize $x$ so that $\|x - x_k\|_1 = 1$. It is not necessary for the
decoding algorithm to know the original value of $\|x - x_k\|_1$.

Heavy Hitters. Suppose a signal $x$ can be written as $x = y + z$, where $|\mathrm{supp}(y)| \le k$ and $\|z\|_1 \le \eta$.
Then we say that $\mathrm{supp}(y)$ are the $(k, \eta)$-heavy-hitters of $x$. We will frequently drop the $(k, \eta)$- when clear
from context. Ambiguity in the decomposition $x = y + z$ is inherent in approximate sparse problems and
will not cause difficulty with our algorithm.
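
As a small illustration of these definitions, the sketch below computes one valid decomposition $x = y + z$ by taking $y$ to be the best $k$-term approximation. The function name `best_k_term` and the example vector are assumptions for illustration, and ties in magnitude are broken arbitrarily, which reflects the ambiguity mentioned above.

```python
def best_k_term(x, k):
    """Return (x_k, tail): keep the k largest-magnitude entries of x, zero the rest."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])                      # positions of the k heavy hitters
    x_k = [x[i] if i in keep else 0.0 for i in range(len(x))]
    tail = sum(abs(x[i]) for i in range(len(x)) if i not in keep)  # ||x - x_k||_1
    return x_k, tail

x = [0.1, -5.0, 0.2, 3.0, -0.1, 0.05]
y, eta = best_k_term(x, 2)                     # y is 2-sparse, eta bounds the tail norm
z = [xi - yi for xi, yi in zip(x, y)]          # tail, so that x = y + z
print(y, eta)
```

Here positions 1 and 3 are the $(2, \eta)$-heavy-hitters for any $\eta$ at least the printed tail norm; dividing $x$ by that tail norm gives the normalization $\|x - x_k\|_1 = 1$ used in the analysis.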

Optimal number of measurements. For this paper, we only consider algorithms using the optimal
number of measurements, up to constant factors.²

We will use the following form of the Chernoff bound.

Lemma 2 (Chernoff) Fix real number $p$, $0 < p < 1$. Let $X_1, X_2, \ldots, X_n$ be a set of independent
0/1-valued random variables with expectation $p$. Let $X = \sum_i X_i$ and let $\mu = pn$ denote $E[X]$. For any $\delta > 0$,
we have

$$\Pr\bigl(X > (1+\delta)\mu\bigr) \le \left(\frac{e\mu}{a}\right)^{a},$$

where $a = (1+\delta)\mu$. If $a = \Omega(n)$ and $a > (1+\Omega(1))e\mu$, then the above probability is $p^{\Omega(a)}$.
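
The $(e\mu/a)^a$ form of the bound is the one invoked in the proofs below, so a quick numerical sanity check may be useful. The sketch compares it with the exact binomial tail for one arbitrarily chosen setting; the function names and the values of `n`, `p`, and `a` are assumptions for illustration.

```python
import math

def binomial_upper_tail(n, p, a):
    """Exact Pr(X >= a) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(a, n + 1))

def chernoff_bound(n, p, a):
    """The (e * mu / a) ** a bound, with mu = p * n."""
    mu = p * n
    return (math.e * mu / a) ** a

n, p, a = 200, 0.05, 40          # mu = 10, so a = 4 * mu
exact = binomial_upper_tail(n, p, a)
bound = chernoff_bound(n, p, a)
print(exact, bound)
```

The bound is loose but already tiny once $a$ exceeds $e\mu$ by a constant factor, which is the regime used in the analysis.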

Parameter summary. We use the parameter $k$ for sparsity in the toplevel signal, but a different symbol,
the parameter $s$, in subroutines, so we can say things like, "put $s = k/2^j$ in the $j$'th iteration." Similarly, the
parameters $\epsilon$, $\alpha$, and $\eta$ are related "noise" or approximation ratio parameters in the various routines, and $\zeta$
is an "omission" parameter, such that we guarantee to recover all but $\zeta s$ heavy hitters in an $s$-sparse signal.

² The optimal number of measurements, if $\epsilon$ is considered to be a constant, is [BIPW10] $\Theta(k\log(N/k)) = \Theta\bigl(\log\binom{N}{k}\bigr)$. Our $\epsilon$
dependence is cubic (quadratic in a warmup algorithm), which is sub-optimal compared with the quadratic dependence in the best
algorithms.

3 Main Result

3.1 Weak System

We start with a Weak System. Intuitively, a Weak system, operating on measurements, cuts in half the
number of unknown heavy hitters, while not increasing tail noise by much. We estimate all values in $x$ and
take the largest $O(k)$ estimates. To estimate the values, we hash all $N$ positions into $B = O(k)$ buckets.
Each position $i$ to estimate has a $\Omega(1)$ probability of getting hashed into a bucket with no (other) items
larger than $1/k$ and the sum of other items not much more than the average value, $1/B \approx 1/k$, in which
case the sum of values in the bucket estimates $x_i$ to within $\pm 1/k$. By a concentration of measure argument
for dependent random variables, we conclude that $\Omega(k)$ measurements are good except with probability
$p = 2^{-\Omega(k)}$, and, if we repeat $t = O(\log(N/k))$ times, some $\Omega(k)$ items get more than $t/2$ correct estimates
except with probability $p^{\Omega(t)} = (k/N)^{\Omega(k)}$. In the favorable case, the median estimate is correct. The failure
probability is small enough to take a union bound over all sets of $O(k)$ positions, so we conclude that no set
of $\Omega(k)$ estimates is bad, i.e., there are at most, say, $k/2$ failures, as desired.

We first present an algorithm that simply treats all of $[N]$ as suitable candidates. This makes the runtime
slightly superlinear. Below, we will show how to get a smaller set of candidates, which speeds the algorithm
at the cost of a controllable increase in the number of measurements.

Definition 3 A Weak system consists of parameters $N, s, B, \eta, \zeta$, an $m$-by-$N$ measurement matrix $\Phi$, and
a decoding algorithm, $\mathcal{D}$. Consider signals $x$ that can be written $x = y + z$, where $|\mathrm{supp}(y)| \le s$,
$\mathrm{supp}(y) \subseteq I$, and $\|z\|_1 \le O(1)$.

Given the parameters, $I$, a measurement matrix $\Phi$ and measurements $\Phi x$ for any $x$ with a
decomposition above, the decoding algorithm returns $\hat{x}$, such that $x = \hat{x} + \hat{y} + \hat{z}$, where $|\mathrm{supp}(\hat{x})| \le O(s)$,
$|\mathrm{supp}(\hat{y})| \le \zeta s$, and $\|\hat{z}\|_1 \le \|z\|_1 + \eta$.

Without loss of generality, we may assume that $\mathrm{supp}(y) \cap \mathrm{supp}(z) = \mathrm{supp}(\hat{y}) \cap \mathrm{supp}(\hat{z}) = \emptyset$, but, in
general, $\mathrm{supp}(\hat{x})$ intersects both $\mathrm{supp}(\hat{y})$ and $\mathrm{supp}(\hat{z})$.

The parameter $B$ will always be set to $2s$ in implementations. We prove correctness for general $B$
because the generality is needed to prove Lemma 6 below.

Lemma 4 (Weak) With probability $1 - \binom{N}{s}^{-\Omega(1)}$ over the choice of hash functions, Algorithm 1, with
appropriate instantiations of constants, is a correct Weak system that uses $O(\eta^{-2}\zeta^{-4}s\log(N/s))$ measurements
when $B = O(s)$ and runs in time $O(|I|\eta^{-1}\zeta^{-2}\log(N/s))$.

Proof. The number of measurements and runtime are as claimed by construction, so we show correctness.
There are several parts to this, and much of this is similar to or implicit in previous work. We show that,
with probability at least 3/4 over the choice of $\Phi$:

1. For any set $S = \mathrm{supp}(y)$ of $s$ heavy hitters and any set $D = \mathrm{supp}(\hat{x})$ of $s$ "decoys" that might
displace $S$, at most $O(\zeta s)$ elements of $S \cup D$ collide, in at least $t/4$ of their buckets, with an element
of $S \cup D \cup T$, where $T$ is the set of the top $O(s/(\zeta\eta))$ elements.

2. Let $A$ be the set of rows of $\Phi$ with a one anywhere in columns $S \cup D$ and let $\Phi_A$ be $\Phi$ restricted to the
rows of $A$. Let $F$ be a set of $\Omega(s/(\zeta\eta))$ columns disjoint from $S \cup D \cup T$, and let $\nu$ be an $N$-vector
such that $\nu = 1/|F|$ on $F$ and zero elsewhere ($\nu$ is a "flat tail"). We have $\|\Phi_A\nu\|_1 \le O(\eta\zeta t)$.

Algorithm 1 A Weak system.

Input: $N$, sparsity $s$, noise $\eta$, $\Phi x$, hash parameter $B$, omission $\zeta$, candidate set $I = [N]$
Output: $\hat{x}$

for $j = 1$ to $t = O(\eta^{-1}\zeta^{-2}\log(N/s)/\log(B/s))$ do
    Hash $h : [N] \to [O(\eta^{-1}\zeta^{-2}B)]$
    for $i \in I$ do
        $\hat{x}_i^{(j)} = \sum_{h(i') = h(i)} x_{i'}$    // signal values in $i$'s hash bucket (an element of input $\Phi x$)
    end for
end for
for $i \in I$ do
    Let $\hat{x}'_i$ be the median over $j \le t$ of $\hat{x}_i^{(j)}$
end for
Zero out all but the largest $O(s)$ elements of $\hat{x}'$; get $\hat{x}$
return $\hat{x}$

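3. Let $A$, $\Phi_A$ be as above. For any $\nu$ supported off $S \cup D$ with $\|\nu\|_1 = 1$ and $\|\nu\|_\infty \le O(1/|T|) = O(\eta\zeta/s)$,
we have $\|\Phi_A\nu\|_1 \le O(\eta\zeta t)$.

4. There is a decomposition $x = \tilde{x} + y' + z'$ such that:

• $\tilde{x}$ equals $\hat{x}'$ on $\mathrm{supp}(x_s)$ and zero elsewhere, where $\hat{x}'$ is as in Algorithm 1

• $|\mathrm{supp}(y')| \le \zeta s$

• $\|z'\|_1 \le \|z\|_1 + O(\eta)$.

5. (The lemma's conclusion.) There is a decomposition $x = \hat{x} + \hat{y} + \hat{z}$ with $|\mathrm{supp}(\hat{x})| \le O(s)$,
$|\mathrm{supp}(\hat{y})| \le \zeta s$, and $\|\hat{z}\|_1 \le \|z\|_1 + \eta$.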

The dependence is as follows. Item 3 for general tails follows from Item 2 for flat tails. Item 4 follows
from Items 1 and 3 and shows that the estimates lead to an acceptable decomposition of $x$, assuming some
choice (generally unknown to the algorithm) of support for the restricted estimate vector, namely $\mathrm{supp}(x_s)$. Finally, Item 5 follows
from Item 4 by considering the displacement of an element in the support of $x_s$ by an element in the
Algorithm's output, i.e., the support of $\hat{x}$. Only Items 1 and 2 involve probabilistic arguments.
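
The structure of Algorithm 1 (hash, sum each bucket, take per-coordinate medians, keep the largest estimates) can be sketched in a few lines. This is an illustrative sketch under simplifying assumptions, not the paper's implementation: fully random hash functions are drawn with Python's `random` module, the bucket count and number of repetitions are passed in directly rather than set from $\eta$, $\zeta$, and $B$, the "largest $O(s)$" is taken to be `2 * s`, and the function name `weak_system` is hypothetical. For brevity the same function both forms the measurements (bucket sums) and decodes them.

```python
import random
from statistics import median

def weak_system(x, s, num_buckets, reps, seed=0):
    """Median-of-bucket-sums estimates of every coordinate; keep the largest 2*s."""
    rng = random.Random(seed)
    n = len(x)
    estimates = [[] for _ in range(n)]
    for _ in range(reps):
        h = [rng.randrange(num_buckets) for _ in range(n)]  # hash h: [N] -> [num_buckets]
        sums = [0.0] * num_buckets                           # one block of the measurements
        for i, v in enumerate(x):
            sums[h[i]] += v
        for i in range(n):
            estimates[i].append(sums[h[i]])                  # i's bucket sum estimates x_i
    med = [median(e) for e in estimates]                     # median over repetitions
    order = sorted(range(n), key=lambda i: abs(med[i]), reverse=True)
    keep = set(order[:2 * s])                                # assumed choice of "largest O(s)"
    return [med[i] if i in keep else 0.0 for i in range(n)]

rng = random.Random(1)
n, s = 2000, 5
x = [rng.uniform(-1e-4, 1e-4) for _ in range(n)]            # small tail noise
heavy = rng.sample(range(n), s)
for i in heavy:
    x[i] = 10.0                                              # plant s heavy hitters
x_hat = weak_system(x, s, num_buckets=64, reps=15)
print(sorted(i for i in range(n) if abs(x_hat[i]) > 1.0), sorted(heavy))
```

Because every coordinate is estimated, the loop over `range(n)` is what makes this version superlinear; Section 3.2 is concerned with shrinking that candidate set.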



Item 1. Fix a decomposition $x = y + z$ as above, let $S$ equal $\mathrm{supp}(y)$, and let $D \subseteq [N]$ be any set of
$s$ positions. (We only care about the case $D = \mathrm{supp}(\hat{x})$, but, to handle stochastic dependence issues, it is
necessary to prove the result for a general $D$ of this size.) We want to show that at most $O(\zeta s)$ elements of
$S \cup D$ collide with one of the top $O(s/(\zeta\eta))$ elements in at least $t/4$ of their $t$ buckets. Let $T$ be the set of
top $O(s/(\zeta\eta))$ elements in $[N]$.

Intuitively, there are $\Omega(\eta^{-1}\zeta^{-2}B)$ hash buckets and at most $O(|T|)$ are ever occupied by an element of
$S \cup D \cup T$, so each element of $S \cup D$ has at most a $O(|T|\eta\zeta^2/B) = O(s\zeta/B) \le O(\zeta)$ chance to collide
when it is hashed. As we discuss below, this implies that the expected number of collisions per element of $S \cup D$ (at the time
of hashing or later) is $O(s\zeta/B)$ in each of the $t$ repetitions. If all estimates (over all $i$ and all repetitions)
were independent, we could apply the Chernoff bound (Lemma 2) and conclude that the number of failed
element-repetition pairs exceeds $O(\zeta|S \cup D|t) = O(\zeta st)$ only with probability $\binom{N}{s}^{-\Omega(1)}$, small enough to
take a union bound over all $(S, D, T)$. But it is easy to see (and also see below)
that there is at least some small dependence. So instead we proceed as follows, using a form of the Method
of Bounded Differences and coupling [DP09, MR95, MU05].

First hash the elements of $T \setminus (S \cup D)$. Then hash the elements of $S \cup D$, in some arbitrary order. Let
$X_j$ be the 0/1-valued random variable that takes the value 1 if the $j$'th element of $S \cup D$ is hashed into a
bucket that is bad (occupied by an element of $S \cup D \cup T$) at the time of $j$'s hashing. As above, each $X_j$ has
$E[X_j] \le \zeta$.

Note that even if some $i \in S \cup D$ is isolated at the time of its hashing, $i$ may become clobbered by
an element $j \in S \cup D$ that is later hashed into its bucket. So $\sum_j X_j$ is not the total number of failed
estimates. But observe that if some $j$ is hashed into the same bucket as previously-hashed items, it can only
clobber at most one other previously-unclobbered element $i$, because $j$ is only hashed into one bucket, and
that bucket has at most one previously-unclobbered item. It follows that $2\sum_j X_j$ is an upper bound on the
number of colliding items in $S \cup D$, where, for some $p$, the $X_j$'s are 0/1-valued random variables with the
expectation of each $X_j$ bounded by $p$, even conditioned on any outcomes of $X_{<j}$. This is enough to get the
conclusion of the Chernoff inequality with independent trials of failure probability $p$, by a standard coupling
argument. (See, e.g., exercise 1.7 of [DP09].) In the standard proof of Chernoff, we have, for any $\lambda > 0$,

$$\Pr\Bigl(\sum_j X_j > a\Bigr) = \Pr\Bigl(e^{\lambda\sum_j X_j} > e^{\lambda a}\Bigr) \le E\Bigl[\prod_j e^{\lambda X_j}\Bigr]\Big/ e^{\lambda a}.$$

At this point, if the $X_j$'s were independent, we would get the product of expectations. Instead, we proceed
as follows, where $Y_j$'s are independent random variables with expectation $p$.

$$\begin{aligned}
\Pr\Bigl(\sum_j X_j > a\Bigr)
 &\le E\Bigl[\prod_{j \le n} e^{\lambda X_j}\Bigr]\Big/ e^{\lambda a}\\
 &= \sum_v \bigl(\Pr(X_n = 0 \mid X_{<n} = v) + \Pr(X_n = 1 \mid X_{<n} = v)\,e^{\lambda}\bigr)\Pr(X_{<n} = v)\,e^{\lambda\cdot\mathrm{weight}(v)}\Big/ e^{\lambda a}\\
 &\le \sum_v \bigl(1 - p + pe^{\lambda}\bigr)\Pr(X_{<n} = v)\,e^{\lambda\cdot\mathrm{weight}(v)}\Big/ e^{\lambda a}\\
 &= E\bigl[e^{\lambda Y_n}\bigr]\,E\Bigl[\prod_{j < n} e^{\lambda X_j}\Bigr]\Big/ e^{\lambda a}.
\end{aligned}$$

Proceed inductively, getting

$$\Pr\Bigl(\sum_j X_j > a\Bigr) \le E\Bigl[\prod_j e^{\lambda Y_j}\Bigr]\Big/ e^{\lambda a},$$

to which the rest of the usual Chernoff-type bound applies. Thus the expected number of pairs of elements
in $S \cup D$ and repetition that collide is at most $O(\zeta st)$.

Having shown that our dependent collision events behave like independent events up to constants, we
now go over the arithmetic, assuming independent collisions. Each $i \in S \cup D$ fails in each repetition
with probability at most $O(\zeta s/B)$ (wlog, exactly $\zeta s/B$ for now). Among the $(2st)$ pairs of $i \in S \cup D$
and repetition, we expect to get $\mu = O(\zeta s^2t/B)$ failed pairs, and we get at least $a \ge \zeta st$ failures with
probability at most $(e\mu/a)^a$, by Lemma 2 (Chernoff). So the failure probability is

$$(e\mu/a)^a = O(s/B)^{\zeta st} = (s/B)^{\Omega(s\eta^{-1}\zeta^{-1}\log(N/s)/\log(B/s))} = (s/N)^{\Omega(s\eta^{-1}\zeta^{-1})},$$

which is small enough to take a union bound over $(T, S, D)$. In the favorable case, there is only a fraction
$O(\zeta)$ of all pairs of item and repetition with a failed estimate. It follows, after adjusting constants, that less
than $(1/2)\zeta s$ items get more than $t/4$ failed original estimates. The remaining $(1 - \zeta/2)s$ items get good
final median estimate (even if another $t/4$ original estimates fail for other reasons, as we discuss below),
since a median estimate fails only if a majority of mediand estimates fail.
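
The deterministic clobbering bound used in Item 1, namely that the number of colliding items in $S \cup D$ is at most $2\sum_j X_j$, holds for every outcome of the hashing and can be checked by simulation. The sketch below is illustrative only; the function name `sequential_hash` and the set sizes are assumptions, and fully random hashing stands in for the paper's hash functions.

```python
import random

def sequential_hash(num_T, num_SD, num_buckets, rng):
    """Hash T first, then S union D one at a time; return (sum of X_j, colliding items)."""
    occupied_by_T = [False] * num_buckets
    for _ in range(num_T):
        occupied_by_T[rng.randrange(num_buckets)] = True
    sd_count = [0] * num_buckets          # elements of S union D in each bucket so far
    total_X = 0
    for _ in range(num_SD):
        b = rng.randrange(num_buckets)
        if occupied_by_T[b] or sd_count[b] > 0:
            total_X += 1                  # X_j = 1: bucket already occupied when j arrives
        sd_count[b] += 1
    # An element of S union D collides if its bucket holds any other element of S, D, or T.
    colliding = sum(c for b, c in enumerate(sd_count)
                    if c > 0 and (occupied_by_T[b] or c > 1))
    return total_X, colliding

rng = random.Random(0)
results = [sequential_hash(40, 20, 400, rng) for _ in range(1000)]
print(max(col - 2 * tx for tx, col in results))
```

The printed maximum is never positive: a late arrival with $X_j = 1$ accounts for itself and for at most one previously isolated element in its bucket.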

Item 2. Fix $S, D, T, |F|, F$, choose the $S \cup D$ columns and the $T$ columns of $\Phi$ (arbitrarily for this
discussion), and thereby define $A$ (the rows of $\Phi$ with a 1 in columns $S \cup D$) and $\nu$ (equal to $1/|F|$ on $F$
and zero elsewhere), as above. We now hash the elements of $F$ at random, i.e., choose the $F$ columns of $\Phi$.
In each repetition, there are $\Omega(\eta^{-1}\zeta^{-2}B)$ buckets, of which $O(s)$ are in $A$. It follows that each element in
$F$ hashes to $A$ in each repetition with probability $O(\eta\zeta^2s/B)$. Counting repetitions, there are a total of $t|F|$
elements that each hash into $A$ with probability $\eta\zeta^2s/B$. We expect $\mu = \eta\zeta^2t|F|s/B$ element-repetition
pairs of $t|F|$ total to hash into $A$ and we get more than $a = \eta\zeta^2t|F|$ with probability at most

$$(e\mu/a)^a = (s/B)^{\Omega(\eta\zeta^2t|F|)} \le (s/B)^{\Omega(|F|\log(N/s)/\log(B/s))} = (s/N)^{\Omega(|F|)},$$

which is small enough to take a union bound over all $S, D, T, |F|, F$. Since elements of $\nu$ have magnitude
$1/|F|$, it follows that $\|\Phi_A\nu\|_1 \le a/|F| = O(\eta\zeta^2t)$, so we conclude $\|\Phi_A\nu\|_1 \le O(\eta\zeta t)$.

At this point, we have that, except with probability 1/4, at most $O(\zeta s)$ of $S \cup D$ items collide with $S \cup D$
or with an element of $T$ (of magnitude at least $\eta\zeta/s$) in more than $t/4$ of their repetitions and no flat tail of
support size at least $s/(\eta\zeta)$ contributes more than a constant times its expected amount, which is $O(\eta\zeta t)$ if
the magnitude of $\nu$ is maximal, into the buckets $A$ containing the top $s$ heavy hitters. Conditioned on this
holding, we proceed non-probabilistically.

Item 3. Let $\nu$ be any vector supported disjointly from $S \cup D$ with $\|\nu\|_1 = 1$ and $\|\nu\|_\infty \le O(\zeta\eta/s)$
as above. Since $\Phi$ is non-negative, we may assume that $\nu$ is non-negative, as well, by replacing $\nu$ with
$|\nu|$. Next, round each non-zero element of $\nu$ up to the nearest power of 2, at most doubling $\nu$. Write
$\nu = \sum_i w_i\nu_i$, where $\nu_i$ takes on only the values 0 and $2^{-i}$, and $w_i$ is 0 or 1. Also write $\nu = \nu' + \nu''$,
where $\nu_i$ contributes to $\nu'$ if the support of $\nu_i$ is at least $s/(\eta\zeta)$ and $\nu_i$ contributes to $\nu''$, otherwise. The $\nu_i$'s
contributing to $\nu'$ are multiples of flat tails of the kind handled in Item 2 and their sum, $\nu'$, which has 1-norm
at most 1, is a subconvex combination of such flat tails. Since $\|\Phi_A\nu\|_1$ is subadditive in $\nu$ (actually, strictly
additive under our non-negativity assumption), we get $\|\Phi_A\nu'\|_1 \le O(\eta\zeta t)$.

Now consider the sum $\nu''$ of $\nu_i$ with support less than $s/(\eta\zeta)$. In general, these can contribute more than
their expected value, but not much more than the expected value, and the expected value is typically much
less than for other flat tails. We will handle the sum of these at once (without using the convex combination
argument), so we may assume the supports are the maximum, $s/(\eta\zeta)$, by increasing each actual support
to a superset. Also, we may assume that the corresponding $w_i$'s are as large as possible, i.e., $w_i = 1$ if
$2^{-i} \le \eta\zeta/s$ and $w_i = 0$, otherwise (so that the maximum magnitude is $\eta\zeta/s$). With these assumptions,
each such flat tail contributes not much more than its expected number, $O(st)$, of elements of magnitude
$2^{-i} = 2^{-j}\eta\zeta/s$ for some $j \ge 0$. Thus $\|\Phi_A\nu_i\|_1 = O(\eta\zeta t2^{-j})$ for $i$ and $j$ as above. The sum (which can be
greater than a convex combination of the original contribution but, it turns out, is at most a constant times
a convex combination under our assumptions) contributes $\|\Phi_A\nu''\|_1 \le O\bigl(\eta\zeta t\sum_{j \ge 0}2^{-j}\bigr) = O(\eta\zeta t)$, as
desired.

Thus $\|\Phi_A\nu\|_1 \le \|\Phi_A\nu'\|_1 + \|\Phi_A\nu''\|_1 \le O(\eta\zeta t)$.

³ This seems loose by a factor $\zeta$, but local fixes, like replacing $\zeta$ with $\sqrt{\zeta}$, do not seem to work. We speculate that better
dependence on $\zeta$ is possible.

Table 2: Contributions $x_i - \hat{x}_i$ to $\hat{y}$ and to $\hat{z}$ from $i \in \mathrm{supp}(y)$ and $i \in \mathrm{supp}(z)$, according to whether
$i \in \mathrm{supp}(\hat{x})$, whether $i \in \mathrm{supp}(\hat{x})$ has a good or bad estimate (i.e., whether or not the median estimate is good
to within $\pm O(\eta/s)$), or, if $i \in \mathrm{supp}(y) \setminus \mathrm{supp}(\hat{x})$, according to whether $i$ was displaced by $i'$ with a good or bad
estimate, under an arbitrary pairing between $i \in \mathrm{supp}(y) \setminus \mathrm{supp}(\hat{x})$ and $i' \in \mathrm{supp}(\hat{x}) \setminus \mathrm{supp}(y)$. Note that
zero may be a good estimate.

                                                        i in supp(y)                 i in supp(z)
                                                     Good         Bad             Good         Bad
                                                     estimate     estimate        estimate     estimate
i in supp(x̂)                                         ẑ            ŷ               ẑ            ŷ
i not in supp(x̂); displaced by bad estimate          ŷ            ŷ               ẑ            ẑ
i not in supp(x̂); displaced by good estimate         ẑ            ŷ               ẑ            ẑ


Item 4. Let $\tilde{x}$ (the median estimates $\hat{x}'$ restricted to $\mathrm{supp}(x_s)$) be as above. In Item 1, we showed that an acceptable number $O(\zeta s)$ of elements of $S \cup D$
suffer collisions; here we consider only the elements of $S \cup D$ that do not collide with $S \cup D \cup T$.
So we can consider only the tail elements that are still relevant, i.e., the elements of $[N] \setminus (S \cup D \cup T)$,
which have magnitude at most $\eta\zeta/s$. These form a tail $\nu$ as described in Item 3. Consider $i$ to be a failure
if $|\tilde{x}_i - x_i| > \Omega(\eta/s)$. Then each failed $i$ in $\tilde{x}$ requires $t/2$ failed $i$'s in $\hat{x}^{(j)}$'s and, since collisions only
account for $t/4$ $i$'s in $\hat{x}^{(j)}$'s, each failed $i$ in $\tilde{x}$ that does not fail due to collisions also requires $\Omega(t)$ failed
$i$'s in $\hat{x}^{(j)}$'s. Thus each failed but non-colliding $i$ accounts for $\Omega(t\eta/s)$ of $\|\Phi_A\nu\|_1$. Since $\|\Phi_A\nu\|_1 \le O(\eta\zeta t)$,
there can be at most $O(\zeta s)$ failures, as desired. The remaining at-most-$s$ estimates of $\tilde{x}$ each are good to
within $O(\eta/s)$, additively, so the total 1-norm of the estimation errors is $O(\eta)$, as desired.

Item 4. To complete our analysis of correctness, we describe $\hat{x}$, $\hat{y}$, and $\hat{z}$ and show that they have the
claimed properties. This is summarized in Table 2.

• The pseudocode Algorithm 1 returns $\hat{x}$, which has support size $O(s)$.

• Elements $i \in \mathrm{supp}(\hat{x})$ with a good estimate (to within $\pm O(\eta/s)$) contribute $x_i - \hat{x}_i$ to $\hat{z}$. There are
at most $O(s)$ of these, each contributing $O(\eta/s)$, for total contribution $O(\eta)$ to $\hat{z}$.

• Elements $i \in \mathrm{supp}(\hat{x})$ with a bad estimate (not to within $\pm O(\eta/s)$) contribute $x_i - \hat{x}_i$ to $\hat{y}$. There
are at most $O(\zeta s)$ of these.

• Elements $i \in \mathrm{supp}(z) \setminus \mathrm{supp}(\hat{x})$ contribute $x_i$ to $\hat{z}$. The $\ell_1$ norm of these is at most $\|z\|_1$.

• Elements $i \in \mathrm{supp}(y) \setminus \mathrm{supp}(\hat{x})$ with a good estimate that are nevertheless displaced by another
element $i' \in \mathrm{supp}(\hat{x}) \setminus \mathrm{supp}(y)$ with a good estimate contribute $x_i$ to $\hat{z}$. There are at most $s$ of these.
While the value $x_i$ may be large and make a large contribution to $\hat{z}$, this is offset by $x_{i'}$ satisfying,
for some $c$, $|x_{i'}| \ge |\hat{x}_{i'}| - c\eta/s \ge |\hat{x}_i| - c\eta/s \ge |x_i| - 2c\eta/s$, which contributes to $z$ but not to
$\hat{z}$. Thus the net contribution to $\hat{z}$ is at most $O(\eta/s)$ for each of the $O(s)$ of these $i$, for a total $O(\eta)$
contribution to $\hat{z}$.

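The contributions of such $i$ and $i'$ are summarized in the following table, whence the reader can
confirm that $(y + z)_{\{i,i'\}} = (\hat{x} + \hat{y} + \hat{z})_{\{i,i'\}}$ and $\|\hat{z}_{\{i,i'\}}\|_1 \le \|z_{\{i,i'\}}\|_1 + O(\eta/s)$.

          |  $y$      $z$         |  $\hat{x}$         $\hat{y}$    $\hat{z}$
    ------+-----------------------+--------------------------------------------------
    $i$   |  $x_i$    0           |  0                 0            $x_i$
    $i'$  |  0        $x_{i'}$    |  $\hat{x}_{i'}$    0            $x_{i'} - \hat{x}_{i'}$
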

• Elements $i \in \mathrm{supp}(y) \setminus \mathrm{supp}(\hat{x})$ that themselves have bad estimates or are displaced by elements
with bad estimates contribute $x_i$ to $\hat{y}$. There are at most $\zeta s$ bad estimates overall, so there are at most
$O(\zeta s)$ of these.

We have shown that $|\mathrm{supp}(\hat{y})| \le O(\zeta s)$ and $\|\hat{z}\|_1 \le \|z\|_1 + O(\eta)$. By adjusting constants in the
algorithm, we can arrange for the conclusion of the Lemma.



3.2 Sublinear Time 

In this section, we introduce a way to limit $I$ to get a sublinear time Weak system. Since the runtime of
the weak system will dominate the overall runtime, it follows that the overall algorithm will have sublinear
time. We first give a basic algorithm with runtime approximately $\sqrt{kN}$, then we generalize from $\sqrt{kN} =
k(N/k)^{1/2}$ to $\ell^{O(1)} k (N/k)^{1/\ell}$ for any positive integer $\ell$, but with number of measurements suboptimal by
the factor $\ell^{O(1)}$.

The basic idea, for $\ell = 2$ (and ignoring for now the small effects of $\epsilon$, which we set to $\Omega(1)$), is as follows.
Hash $h : [N] \to [\sqrt{kN}]$, and repeat a total of two times. In each repetition, a heavy hitter avoids collisions
except with probability $k/\sqrt{kN} = \sqrt{k/N}$. Also, the average amount of tail noise (sum of others in the
bucket) is $1/\sqrt{kN}$, so the tail noise exceeds $1/k$ on at most the fraction $k/\sqrt{kN} = \sqrt{k/N}$ of the buckets.
So a heavy hitter dominates its bucket except with probability $O(\sqrt{k/N})$. The heavy hitter fails to dominate in
both of the two repetitions with probability equal to the square of that, or $O(k/N)$, which is what
we would need to apply the Chernoff bound and to conclude that, except with probability $\binom{N}{k}^{-1}$ (which is
small enough to take our union bound), $\Omega(k)$ of the heavy hitters are isolated in low-noise buckets. There is
some dependence here, which is handled as in Section 3.1.

Now focus on one of the two repetitions. We can form a new signal $x'$ of length $N' = \sqrt{kN}$ and
sparsity $k' = \Omega(k)$. The signal $x'$ is indexed by hash buckets and $x'_j = \sum_{h(i)=j} x_i$, the sum of the values
in $x$ that are hashed to the same bucket. The original $(N, k)$ signal (of length $N$ and sparsity $k$) and a new
$(N', k') = (\sqrt{kN}, \Omega(k))$ signal form what we call a two-level signal filtration, of which there are two, for
the two repetitions.

For each filtration, run the Weak system (Algorithm 1) on the $(N', k')$ signal $x'$, getting a set $H$ of $\Omega(k)$
heavy hitters. This uses $O(k' \log(N'/k')) = O(k \log(N/k))$ measurements and runtime led by the factor
$N' = \sqrt{kN}$. Form the set $I = h^{-1}(H)$ of indices to the original signal. Finally, run the Weak system on
the original signal, but with index set $I$. This also takes $O(k \log(N/k))$ measurements and runtime led by
the factor $|I| = \sqrt{kN}$. Thus the overall runtime is given by the time to make two exhaustive searches over
spaces of size about $\sqrt{kN}$, on each of two repetitions, i.e., $\ell$ repetitions of $\ell$ exhaustive searches over spaces
of size $k(N/k)^{1/\ell}$, for $\ell = 2$. For correctness, we need to argue that the filtration is faithful to the original
signal in the sense that enough heavy hitters from the original signal become heavy hitters in the $(N', k')$
signals and that we can successfully track enough of these back to the original signal.
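
The two-level idea can be illustrated with a toy sketch. It is not the decoder of this paper: it reads the bucket sums of $x'$ directly instead of running the Weak system on measurements, and the names `two_level_candidates`, `recover`, and the seeded table-based hash are assumptions made for illustration. It only shows how hashing into about $\sqrt{kN}$ buckets and inverting the hash shrinks the search from $N$ indices to the preimage set $I = h^{-1}(H)$.

```python
import math
import random
from collections import defaultdict


def two_level_candidates(x, k, seed=0):
    """Return the candidate index set I = h^{-1}(H) for one repetition.

    x is a dense list (toy setting). H is the set of the k buckets whose
    aggregated value has the largest magnitude.
    """
    n = len(x)
    num_buckets = max(1, math.isqrt(k * n))       # about sqrt(kN) buckets
    rng = random.Random(seed)
    h = [rng.randrange(num_buckets) for _ in range(n)]   # hash stored as a table

    preimage = defaultdict(list)                  # backpointers: bucket -> indices
    bucket_sum = defaultdict(float)               # x'_j = sum of x_i with h(i) = j
    for i, value in enumerate(x):
        preimage[h[i]].append(i)
        bucket_sum[h[i]] += value

    # Stand-in for the Weak system on x': keep the k heaviest buckets.
    heavy = sorted(bucket_sum, key=lambda j: abs(bucket_sum[j]), reverse=True)[:k]
    return sorted(i for j in heavy for i in preimage[j])


def recover(x, k, repetitions=2):
    """Union the candidates over the repetitions, then keep the top k among them."""
    candidates = set()
    for r in range(repetitions):
        candidates.update(two_level_candidates(x, k, seed=r))
    top = sorted(candidates, key=lambda i: abs(x[i]), reverse=True)[:k]
    return sorted(top)
```

In this noise-free toy setting every heavy hitter lands in one of the $k$ heaviest buckets, so the second search only touches the few hundred indices in $I$ rather than all $N$.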

In the general situation, $\ell$ may be greater than 2. We will have $\ell - 1$ intermediate signals in the levels
of the filtration, which we define below. The runtime will arise from performing $\ell$ repetitions of $\ell$ cascaded
exhaustive searches over spaces of size about $k(N/k)^{1/\ell}$. There is strong overlap between the set of heavy
hitters in the original signal and the set of heavy hitters in the shortest signal (of length $k(N/k)^{1/\ell}$).
Assuming a correspondence of heavy hitters, our task is to trace each such heavy hitter in the shortest signal
through longer and longer signals, back to the original $(N, k)$ signal. Unfortunately, each time we ascend
a level, we encounter more noise, and risk losing the trail of our heavy hitter. In the case of general $\ell$,
we will need to control noise and other losses by setting parameters as a function of $\ell$. Roughly speaking,
we need to lose no more than about $k/\ell$ heavy hitters at each level, i.e., $|\mathrm{supp}(\hat{y})| \le k/\ell$, rather than
losing, say, $k/2$, and (for general $\epsilon$) we need to increase the noise by at most $O(\epsilon/\ell)$ rather than $O(\epsilon)$, i.e.,
$\|\hat{z}\|_1 - \|z\|_1 \le O(\epsilon/\ell)$. This is done by setting the parameter $\zeta$ to $1/\ell$ and $\eta$ to $O(\epsilon/\ell)$ instead of $O(\epsilon)$.
Also, the number of repetitions must increase from $O(\ell)$ to $O(\ell/\epsilon)$.

We now proceed formally, for general number $\ell$ of levels.

Definition 5 Fix integer parameters $s$, $N$, and $\ell$, and real $\xi > 0$. Given a signal $x$ and a hash function
$h : [N] \to [O((s/\xi)(N/s)^{1/\ell})]$, an $\ell$-level signal filtration on $x$ is a collection of $\ell$ signals, $x^{(1)}, x^{(2)}, \ldots, x^{(\ell)}$,
defined as follows. The signal $x^{(1)}$ has length $N^{(1)} = O((s/\xi)(N/s)^{1/\ell})$. Use the hash function $h : [N] \to
[N^{(1)}]$ and define $x^{(1)}$ by $x^{(1)}_j = \sum_{h(i)=j} x_i$. Then, for $1 \le q < \ell$, define $x^{(q+1)}$ from $x^{(q)}$ by splitting
each subbucket $b$ indexing an element of $x^{(q)}$ (i.e., a subset of $[N]$) into subsubbuckets, in some arbitrary,
deterministic way. Denote by $\mathrm{split}(b)$ the resulting set of subsubbuckets. Then $x^{(q+1)} = \bigcup_b \mathrm{split}(b)$. Each
subbucket is split into exactly $(N/s)^{1/\ell}$ subsubbuckets except that buckets in $x^{(\ell-1)}$, which have size only
$\xi(N/s)^{1/\ell}$, are split into $\xi(N/s)^{1/\ell}$ singletons, resulting in $x$. See Figure 1.
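
A small sketch may help make the definition concrete. The code below is an illustrative assumption rather than the construction used in the paper: the hash is a seeded random table, the "arbitrary, deterministic" split is taken to be contiguous chunks of each bucket, and the names `build_filtration`, `level_signal`, and `fanout` (playing the role of $(N/s)^{1/\ell}$) are hypothetical.

```python
import random


def build_filtration(n, num_buckets, fanout, levels, seed=0):
    """Return a list of `levels` partitions of range(n).

    Level 1 comes from a random hash into num_buckets buckets. Each later
    level splits every bucket into at most `fanout` pieces in a fixed,
    deterministic way; the last level is forced to singletons.
    """
    rng = random.Random(seed)
    table = [[] for _ in range(num_buckets)]
    for i in range(n):
        table[rng.randrange(num_buckets)].append(i)
    partitions = [[b for b in table if b]]         # drop empty buckets
    for q in range(2, levels + 1):
        nxt = []
        for bucket in partitions[-1]:
            if q == levels:
                nxt.extend([i] for i in bucket)     # singletons recover x itself
            else:
                size = -(-len(bucket) // fanout)    # ceiling division
                nxt.extend(bucket[p:p + size] for p in range(0, len(bucket), size))
        partitions.append(nxt)
    return partitions


def level_signal(x, partition):
    """x^{(q)}: one entry per bucket, the sum of x over that bucket."""
    return [sum(x[i] for i in bucket) for bucket in partition]
```

Because later levels only split buckets, an index that accounts for most of the $\ell_1$ mass of its level-1 bucket keeps doing so in every finer bucket that contains it.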

Consider a heavy index $i$ in the original signal. It maps to a bucket, $h(i)$. In the favorable case, $i$
dominates $h(i)$, in the sense that $|x_i|$ accounts for, say, 3/4 of the $\ell_1$ norm of $h(i)$. Because the rest of
the filtration involves only splitting buckets, it follows that $i$ will dominate its bucket at each level of the
filtration. For sufficiently many such $i$'s, we therefore find the bucket containing $i$ in level $q + 1$ using a
Weak algorithm, inductively assuming we had the correct bucket at level $q$. We first show that enough heavy
$i$'s dominate their buckets.

Lemma 6 (Filtration Hashing) Fix parameters $N$, $s$, $\ell$, $\alpha$ and let $\xi = \Theta(\alpha)$. Let

$h_j : [N] \to [(s/\xi)(N/s)^{1/\ell}]$

be $O(\ell/\alpha)$ independent hash functions. With adjustable probability $\Omega(1)$ over the choice of the $h_j$, the following holds. Given
signal $x$, suppose $x = y + z$, with $|\mathrm{supp}(y)| \le s$ and $\|z\|_1 = 1$, and suppose, without loss of generality,
that $|x_i| \ge \Omega(\alpha/s)$ for $i \in \mathrm{supp}(y)$. We have $x = y + w + z$, where, for all $i \in \mathrm{supp}(y)$, $i$ dominates
some $h_j(i)$ and $|x_i| \ge \alpha/s$, $|\mathrm{supp}(w)| \le s/6$, and $\|z\|_1 \le 1 + O(\alpha)$.

Proof. This follows directly from Lemma 4, Item 4, letting $\zeta$ be a constant, the $B$ of Lemma 4 equal
$s(N/s)^{1/\ell}$, and $\eta$ of Lemma 4 equal $\Theta(\xi)$ (which is also $\Theta(\alpha)$). Then $x'$ of Lemma 4, Item 4 gives $y$ of this
lemma (these are the surviving heavy hitters); $\hat{y}$ of Lemma 4 gives $w$ of this lemma (these are the ruined
heavy hitters), and the $z$'s in the Lemmas coincide. ∎

Figure 1: A signal filtration. Heavy hitters, denoted by bullets, are likely isolated in low-noise buckets by the
hashing, in which case they dominate their buckets at all levels of the deterministic splitting. Algorithm 1
(Weak) is used to search all of $x^{(1)}$. For $q \ge 1$, given a set $H$ of heavy hitters in $x^{(q)}$, Algorithm 1 (Weak) is
also used to find heavy hitters in $x^{(q+1)}$, but we search only $(N/s)^{1/\ell}$ items of $x^{(q+1)}$ in $I = \bigcup_{b \in H} \mathrm{split}(b)$
(indicated by dashed boxes).

[Figure labels: $x$, length $N$; random hash; deterministic split, 1-to-$(N/s)^{1/\ell}$ (twice); $x^{(\ell)} = x$, length $N$.]

Our Sublinear Time Weak system is given in Algorithm 2.

Lemma 7 With proper instantiations of constants, and with fixed values $\zeta = 1/2$, $B = 2s$, and $I = [N]$,
Algorithm 2 is a correct Weak system (in the sense of the definition of a Weak system). The number of measurements is $O(\ell^8 \alpha^{-3} s \log(N/s))$ and
the runtime is $O(\ell^5 \alpha^{-3} s (N/s)^{1/\ell})$, assuming a data structure that uses preprocessing and space $O(\ell N/\alpha)$.

Proof. We maintain the following invariant for all q: 

Invariant 8 We have x = y + w + z, where supp(y) C |J^. I^ j, \ supp(w)| < (s/6)(l + (g — 1)/^), and 
W^Wi ^ 1 + Oi{q — Elements ofy dominate their buckets. The size of[jj Iqj is s{N / sY^^. 

The invariant holds at initialization by Lemma 6 (Filtration). This is because the elements in $y$ can be
assumed to be of magnitude at least $\alpha/s$ and to dominate their buckets, while the filtration process preserves
the noise $\ell_1$ norm. The invariant is maintained as $q$ increases by Lemma 4 (Weak). The failure probability
can be taken small enough so that we can take a union bound over all $\ell \le \log(N)$ levels times the number
of choices in each level (addressed in the proof of Lemma 4 (Weak)).

At $q = \ell$, we have $\mathrm{supp}(y) \subseteq I$. Each $I_{\ell,j}$ has size $O(s)$, since it is the unsplit support of the output
of Algorithm 1, so $|I| = O(s\ell/\alpha)$. Also, $|\mathrm{supp}(w)| \le s/3$ and $\|z\|_1 \le 1 + O(\alpha)$. The final call to
Algorithm 1 (Weak) recovers all but another $s/6$ of the support of $y$, which, when combined with $w$, gives
at most $s/2$ missed heavy hitters (the vector $\hat{y}$ in the definition of a Weak system). It also contributes an
acceptable amount $O(\alpha)$ of additional noise that, with $z$, constitutes $\hat{z}$ in the definition of a Weak system.
[Figure 1 level labels: $x^{(1)}$, length $s\xi^{-1}(N/s)^{1/\ell}$; $x^{(2)}$, length $s\xi^{-1}(N/s)^{2/\ell}$; $x^{(3)}$, length $s\xi^{-1}(N/s)^{3/\ell}$.]



Algorithm 2 A Fast Weak system.
Input: $N$, sparsity $s$, noise $\alpha$, $\Phi x$
Global integer $\ell \ge 2$ // optimize for application
Output: $\hat{x}$

for $j \leftarrow 1$ to $t = O(\ell/\alpha)$ do
    Pick hash function $h : [N] \to [N^{(1)}]$, using parameters $N$, $s$, $\ell$ input and $\xi \leftarrow \Theta(\alpha)$.
    Implement by a hash table augmented with backpointers and threads for enumerating preimages
    Let $x^{(q)}$ be the filtration of $x$ by $h$
    // track back through levels of the filtration
    for $q \leftarrow 1$ to $\ell - 1$ do
        Call Algorithm 1 (Weak) on $I_{q,j}$, $x^{(q)}$, $\zeta \leftarrow 1/\ell$, noise $\eta \leftarrow \alpha/\ell$, sparsity $s$, and $B = 2s$, getting $\hat{x}$
        if $q < \ell$ then
            $I_{q+1,j} \leftarrow \bigcup_{b \in \mathrm{supp}(\hat{x})} \mathrm{split}(b)$
        end if
    end for
end for

Call Algorithm 1 (Weak) on $x$, $I$, $\zeta \leftarrow 1/6$, noise $\eta \leftarrow \Omega(\alpha)$, sparsity $s$, $B = 2s$, getting $\hat{x}$
return $\hat{x}$
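
The inner loop of Algorithm 2 can be sketched as follows, under simplifying assumptions that are not part of the paper: the call to Algorithm 1 (Weak) is replaced by a stand-in that reads exact bucket sums and keeps the `keep` heaviest buckets, the level-1 buckets are supplied by the caller, and `split` is taken to be contiguous chunking. The function name `track_back` and its parameters are hypothetical. The point is only to show that, after the exhaustive search at level 1, each later level searches just the pieces of the buckets found so far.

```python
def track_back(x, level1_buckets, fanout, levels, keep):
    """Trace heavy buckets down a filtration.

    Returns (indices found at the last level, number of buckets examined).
    """
    candidates = list(level1_buckets)      # level 1 is searched exhaustively
    searched = 0
    heavy = []
    for q in range(1, levels + 1):
        searched += len(candidates)
        # Stand-in for Algorithm 1 (Weak): keep the heaviest buckets by exact sum.
        heavy = sorted(candidates,
                       key=lambda b: abs(sum(x[i] for i in b)),
                       reverse=True)[:keep]
        if q == levels:
            break
        nxt = []
        for b in heavy:
            if q == levels - 1:
                nxt.extend([i] for i in b)             # last split: singletons
            else:
                size = -(-len(b) // fanout)             # ceiling division
                nxt.extend(b[p:p + size] for p in range(0, len(b), size))
        candidates = nxt                   # search only split(H) at the next level
    return sorted(i for b in heavy for i in b), searched
```

With 1000 indices, ten level-1 buckets, fanout 10 and three levels, the sketch examines 10 + 20 + 20 buckets rather than 1000 indices when tracking two heavy hitters.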



Costs. The number of measurements and the runtime are correct by construction, assuming the hash and split
operations can be done in constant time. This is straightforward using a hash table with appropriate pointers
for the split operation. Such a data structure needs space $O(N)$ and preprocessing $O(N)$ for each of the
$O(\ell/\alpha)$ repetitions, for a total of $O(\ell N/\alpha)$. Note that the total cost, over all $\ell$ levels, is only $O(\ell N/\alpha)$ and
not $O(\ell^2 N/\alpha)$, since the contributions from the levels form a geometric series.

In more detail, we first consider dependence on $\alpha$ and, below, on $\ell$. The number of measurements
is proportional to $\alpha^{-3}$, since the number of repetitions is proportional to $\alpha^{-1}$ and the error parameter $\eta$
is proportional to $\alpha$, so each call to Algorithm 1 requires $\alpha^{-2}$ measurements. The bottom ($q = 1$) level
takes runtime cubic in $1/\alpha$, since there are $O(\ell/\alpha)$ repetitions of $I_{1,j}$ of size $O((s/\alpha)(N/s)^{1/\ell})$ and the error
parameter $\eta$ is proportional to $\alpha$. Other levels take runtime just $\alpha^{-2}$, since $|I_{q,j}|$ has size $O(s(N/s)^{1/\ell})$.

The number of measurements depends on the eighth power of $\ell$: one factor for the number of repetitions
in the outer loop, one factor for the number of levels in the inner loop, $\ell^2$ for the tighter approximation
parameter $\eta = \alpha/\ell$ and $\ell^4$ for the tighter omission parameter $\zeta = 1/\ell$, that contribute the factor
$\eta^{-2}\zeta^{-4}$ to the costs. The runtime of each call to Algorithm 1 is proportional to only the first power of
$\eta^{-1}\zeta^{-1}$ times $|I| \log(N/s)$. The bottom level of the filtration involves a search over $I$ of size $(s/\alpha)(N/s)^{1/\ell}$
for $\alpha \approx \epsilon/\ell$, while the other $\ell$ levels of the filtration search over $O(s(N/s)^{1/\ell})$. Thus the runtime is
$O(\ell^5 \alpha^{-3} s (N/s)^{1/\ell} \log(N/s))$.
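
As a quick check of the exponent bookkeeping for measurements, the four factors named above can be multiplied out. This only restates the paragraph's accounting, taking $\eta = \alpha/\ell$ and $\zeta = 1/\ell$ and assuming each call to Algorithm 1 costs $\eta^{-2}\zeta^{-4} s \log(N/s)$ measurements:

```latex
\underbrace{\frac{\ell}{\alpha}}_{\text{repetitions}} \cdot
\underbrace{\ell}_{\text{levels}} \cdot
\underbrace{\Bigl(\frac{\alpha}{\ell}\Bigr)^{-2}}_{\eta^{-2}} \cdot
\underbrace{\Bigl(\frac{1}{\ell}\Bigr)^{-4}}_{\zeta^{-4}} \cdot s \log(N/s)
\;=\; \ell^{8}\, \alpha^{-3}\, s \log(N/s).
```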

Finally, note that $(N/s)^{1/\ell} \log(N/s) \le (N/s)^{1/(\ell-1)}$. By putting $\ell_0 = \ell - 1$, we get

$$\ell^5 (N/s)^{1/\ell} \log(N/s) \le (\ell_0 + 1)^5 (N/s)^{1/\ell_0},$$

which is $O(\ell_0^5 (N/s)^{1/\ell_0})$, so we lose the $\log(N/s)$ factor for sufficiently large $N/s$. ∎
Some remarks follow. Note that both the filtration and the measurement process of Algorithm [T] involve 
hashing. While the hashing of Algorithm [J into B = ■q^^C.^'^s buckets results in B measurements in each 
of ?7~^(^~^ log(A^/s) repetitions, the hashing to create a filtration does not directly result in measurements 
or any recovery-time object. We never make $(N/s)^{1/\ell}$ measurements (that would be too many) and we do
not instantiate the upper levels of the filtration at decode time (instantiating a signal of length $(N/s)^{1-1/\ell}$
would take too long).

3.3 Toplevel System 

Finally, we give a Toplevel system. The construction here closely follows [GPLS10] (where it was presented
for the foreach, $\ell_2$-to-$\ell_2$ problem). A Toplevel system is an algorithm that solves our overall problem.

Definition 9 An approximate sparse recovery system (briefly, a Toplevel system) consists of parameters
$N, k, \epsilon$, an $m$-by-$N$ measurement matrix $\Phi$, and a decoding algorithm $\mathcal{D}$. Fix a vector $x$, where $x_k$ denotes
the optimal $k$-term approximation to $x$. Given the parameters and $\Phi x$, the system approximates $x$ by
$\hat{x} = \mathcal{D}(\Phi x)$, which must satisfy $\|\hat{x} - x\|_1 \le (1 + \epsilon) \|x_k - x\|_1$.

Theorem 10 (Toplevel) Fix parameters $N$, $k$, $\epsilon$. Algorithm 3 (Toplevel) returns $\hat{x}$ satisfying

$$\|\hat{x} - x\|_1 \le (1 + \epsilon) \|x_k - x\|_1 .$$

It uses $O(\ell^8 \epsilon^{-3} k \log(N/k))$ measurements and runs in time $O(\ell^5 \epsilon^{-3} k (N/k)^{1/\ell})$, using a data structure
requiring $O(\ell N k^{0.2}/\epsilon)$ preprocessing time and storage space.

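Algorithm 3 Toplevel System
Input: $\Phi x$, $N$, $k$, $\epsilon$
Output: $\hat{x}$

$\hat{x} \leftarrow 0$
$\mu \leftarrow \Phi x$
for $j = 1$ to $\lg k$ do
    Run Algorithm 2 (Fast Weak) on $\mu$ with length $N$, sparsity $s \leftarrow k/2^j$, approx'n $\alpha \leftarrow O(\epsilon (9/10)^j)$
    Let $x'$ be the result
    Let $\hat{x} = \hat{x} + x'$
    Let $\mu = \mu - \Phi x'$
end for
return $\hat{x}$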



Proof (sketch). Intuitively, the first iteration of Algorithm 3 transforms a measured but unknown $k$-sparse
signal with noise magnitude 1 to a measured but unknown $(k/2)$-sparse signal with noise $1 + O(\epsilon)$. In
subsequent iterations, the sparsity $s$ decreases (relaxes) from $k$ to $k/2$ to $k/4$ while the noise tolerance $\alpha$
decreases (tightens) from $\epsilon$ to $(9/10)\epsilon$ to $(9/10)^2\epsilon$, etc. We save a factor 2 in the number of measurements
because $s$ decreases, and that more than pays for an increase in number of measurements by the factor
$(10/9)^2$ that arises because $\eta$ decreases. Thus measurement cost is bounded by a decreasing geometric series
and so is bounded by a constant times the first term, which is the measurement cost of the first iteration. Overall error is
the sum of a decreasing geometric series with ratio 9/10, so the overall error $\|z\|_1$ remains bounded, by
$1 + O(\epsilon) \le 2$, with the given algorithm. A similar argument (with an additional wrinkle) holds for runtime.
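
The parameter schedule of the Toplevel loop, and the geometric decay it produces, can be tabulated with a short script. This is a sketch under stated assumptions: the hidden constant in $\alpha \leftarrow O(\epsilon(9/10)^j)$ is taken to be 1, logarithmic factors are ignored, the per-iteration cost is modeled as $s \cdot \alpha^{-p}$ with $p = 2$ to match the factor $(10/9)^2$ quoted above, and the function names are hypothetical.

```python
import math


def toplevel_schedule(k: int, eps: float):
    """Per-iteration (j, sparsity s, tolerance alpha) for j = 1 .. lg k."""
    rounds = max(1, int(math.log2(k)))
    return [(j, k / 2 ** j, eps * (9 / 10) ** j) for j in range(1, rounds + 1)]


def cost_ratios(schedule, alpha_power: int = 2):
    """Ratios of consecutive values of s * alpha^(-alpha_power)."""
    costs = [s * a ** (-alpha_power) for _, s, a in schedule]
    return [later / earlier for earlier, later in zip(costs, costs[1:])]
```

Each ratio equals $(1/2)(10/9)^2 = 50/81$, so the modeled costs form a decreasing geometric series, while the tolerances themselves sum to less than $9\epsilon$.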

More formally, note that the returned vector $\hat{x}$ has $O(k)$ terms. There is an invariant that $x = \hat{x} + y + z$,
where $\mu$ is the measurement vector for $y + z$, $|\mathrm{supp}(y)| \le k/2^j$, and

$$\|z\|_1 \le 1 + O(\epsilon)\left[\frac{9}{10} + \left(\frac{9}{10}\right)^2 + \cdots + \left(\frac{9}{10}\right)^j\right]$$

after $j$ iterations. This is true at initialization, where $y = x_k$ and $\|z\|_1 = \|x - x_k\|_1 = 1$. At termination,
$x = \hat{x} + z$, with $\|z\|_1 \le 1 + O(\epsilon)$, since the infinite geometric series sums to 9. Maintenance of the loop
invariant follows from correctness of the Weak algorithm.

Using the bound on measurements for Algorithm 2, the number of measurements used by Algorithm 3
is proportional to

$$\sum_j \ell^8 \epsilon^{-3} (k/2^j) \log(N 2^j/k) (10/9)^{2j} \le \ell^8 \epsilon^{-3} k \log(N/k) \sum_j (50/81 + o(1))^j = O(\ell^8 \epsilon^{-3} k \log(N/k)).$$

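Similarly, using the runtime bound for Algorithm 2 and writing $s(N/s)^{1/\ell}$ as $s^{1-1/\ell} N^{1/\ell}$, the runtime
of Algorithm 3 is proportional to

$$\begin{aligned}
\sum_j \ell^5 \epsilon^{-3} (k/2^j)^{1-1/\ell} N^{1/\ell} (10/9)^{3j}
  &\le \ell^5 \epsilon^{-3} k (N/k)^{1/\ell} \sum_j \left[(10/9)^3 \, 2^{-(1-1/\ell)}\right]^j \\
  &\le \ell^5 \epsilon^{-3} k (N/k)^{1/\ell} \sum_j (10/9)^{3j} \, 2^{-j/2} \qquad \text{since } \ell \ge 2 \\
  &\le \ell^5 \epsilon^{-3} k (N/k)^{1/\ell} \sum_j 0.97^j \\
  &\le O(\ell^5 \epsilon^{-3} k (N/k)^{1/\ell}).
\end{aligned}$$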
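
The constant 0.97 in the last step can be checked numerically. The helpers below (hypothetical names, not from the paper) evaluate the ratio of consecutive terms, $(10/9)^3 \, 2^{-(1-1/\ell)}$, which is largest at $\ell = 2$, and the sum of the resulting geometric series.

```python
def runtime_term_ratio(levels: int) -> float:
    """Ratio of consecutive terms in the runtime series for `levels` levels."""
    return (10 / 9) ** 3 * 2 ** (-(1 - 1 / levels))


def geometric_series_sum(ratio: float) -> float:
    """Sum of ratio^j over j >= 0, valid for 0 <= ratio < 1."""
    if not 0 <= ratio < 1:
        raise ValueError("series does not converge")
    return 1 / (1 - ratio)
```

At $\ell = 2$ the ratio is just under 0.97, so the series converges, although the implied constant is not small.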

Finally, the storage space for hash tables in Algorithm 2 is $N$ for each of $\ell/\alpha$ repetitions. This is
dominated by the smallest $\alpha$, which is $\epsilon (9/10)^{\lg k} = \epsilon k^{-\lg(10/9)} \ge \epsilon k^{-0.2}$, giving $N \ell k^{0.2}/\epsilon$ space and
preprocessing. For any constant real-valued c > 0, this can be improved to {1 / c)^^^^ k^ by replacing 9/10 
with 1 — c and e with ce. This will also increase the runtime and number of measurements by a constant 
factor that depends on c. I 



4 Open Problems 

In this section, we present some generalizations of our algorithm that we leave as open problems. 
Small space. Above we presented an algorithm that used superlinear space to store and to invert a hash 
function. The amount of space is partially excusable because it can be amortized over many instances of
the problem, i.e., many signals. It also has the advantage over a hash function that hash operations can be 
performed simply in time 0(1). It should be possible, however, to use a standard hash function instead of 
a hash table to avoid the space requirement, though the runtime will likely increase. We leave as an open 
problem a fuller treatment of these tradeoffs. 

Column Sparsity. An advantage in sparse recovery is the sparsity of the measurement matrix, $\Phi$. Our
matrix can easily be seen to have at most $(\ell/\epsilon)^{O(1)} \log(N/k) \log(k)$ non-zeros per column, i.e., there is no
leading factor of $k$. But we have not optimized $\Phi$ for column sparsity and we leave that for future work.
Post-measurement noise. Many algorithms in the literature give, as input to the decoding algorithm, not
$\Phi x$, but $\Phi x + \nu$, where $\nu$ is an arbitrary noise vector. The algorithm's performance must degrade gracefully
with $\|\nu\|$ (usually the 2-norm of $\nu$). It can be seen that our algorithm does tolerate substantial noise, but in
$\ell_1$ norm. We leave to future work full analysis and possible improved algorithms.
Lower overhead in number of measurements. The approach we present produces a Toplevel system
from a Weak system, using a filtration. It has a blowup factor of $\ell^8$ in the number of measurements over
a weak system, where $\ell > 1$ is an integer. Thus the blowup factor in number of measurements for a
time-$\sqrt{kN} \log(N/k)$ algorithm is at least 256, even (implausibly) ignoring all overhead and other constant
factors. This should be improved.

Simplify. In [NT08], the authors take a different approach to fast algorithms. They argue that a small
number of Fourier transforms of length $N$ in a simple algorithm that takes linear time with a DFT oracle
will be faster in practice than an algorithm that is asymptotically sublinear. They give an algorithm, CoSaMP,
with runtime slightly greater than $N$, under a plausible assumption about random row-submatrices of the
Fourier matrix and a bound on the "dynamic range" of the problem, i.e., the ratio of $\|x\|_2$ to $\|x - x_k\|_2$.

In the spirit of that paper, it would be good to use our speedup approach under the same assumptions
as their paper, with $\ell = 2$. That is, ideally, we would want to double or triple the number of DFTs in
the original CoSaMP, but reduce the length of the DFTs from $N$ to approximately $\sqrt{kN}$. Our algorithm
also suffers considerable overhead in converting a Weak algorithm into a Toplevel algorithm (a significant
flaw if the goal is a simple, low-overhead algorithm), but CoSaMP has a similar iterative structure and it
is conceivable that our Weak-to-Toplevel overhead can be combined subadditively with CoSaMP's iterative
overhead.